====== statistic() ======

''mixed **statistic**(string //statistic//, array|string //variables//, mixed //option//, [boolean //alldata//])''

The function ''statistic()'' calculates univariate statistics from the data set, i.e. across all questionnaires collected so far.


  * //statistic//\\ Which statistic should be calculated?
    * '''count''' -- counts the frequency of the value specified as ''//option//''.
    * '''percent''' -- percentage of the value specified as ''//option//''.
    * '''crosscount''' -- counts how often specific values occur together in two or more variables. The variables are specified as an array (or as a comma-separated string), and the values to be counted are specified in the same way as ''//option//''.
    * '''mode''' -- most commonly occurring value. 
    * '''min''' -- lowest value.
    * '''max''' -- highest value.
    * '''mean''' -- arithmetic mean of the values.
    * '''groupmean''' -- arithmetic mean of the values within a subgroup. The subgroup is defined by ''//option//'', given as a string consisting of a variable name and the code of the cases to be included, e.g. '''AB01=2'''.
    * '''filter''' -- determines which cases are used for subsequent calls to the ''statistic()'' function (for details see [[#evaluate_partial_data_sets|below]]).

  * //variables//\\ Determines which variable(s) the statistic should be calculated for. The IDs of the individual variables can be found in the **Variables Overview**. If the statistic requires multiple variables, these can be given as a comma-separated string or as an array.
  * //option//\\ Some statistics call for or allow a third entry which is set with this parameter (see below).
  * //alldata//\\ This optional parameter specifies whether all questionnaires are included in the statistics, not just those that have been completed.

**Note:** If ''true'' is not explicitly specified for the parameter //alldata//, only completed questionnaires are included when calculating the statistical values.

**Note:** Test data collected during development of the questionnaire and during pretesting is only included if the current questionnaire is part of the test as well. If the questionnaire is being carried out as part of the regular data collection, ''statistic()'' only counts data from the regular data collection.

**Note:** The data from the current interview are not considered by ''statistic()''.
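
The '''groupmean''' statistic from the list above could be used as in the following minimal sketch. It assumes a rating item ''BB01_03'' and a gender question ''SD01'' coded 1 = female, 2 = male; both IDs and the coding are placeholders. The subgroup is given in the ''variable=code'' form described for //option//.

<code php>
// Placeholder IDs: BB01_03 = rating item, SD01 = gender (1 = female, 2 = male)
$meanWomen = statistic('groupmean', 'BB01_03', 'SD01=1');  // mean rating among women
$meanMen = statistic('groupmean', 'BB01_03', 'SD01=2');    // mean rating among men
html('
  <p>Average rating so far: '.$meanWomen.' (women) and '.$meanMen.' (men).</p>
');
</code>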

**Tip:** The function ''statistic()'' can be used to close the questionnaire after reaching a predefined quota ([[:en:survey:quota]]) and either display a message to further respondents or redirect them to the quota stop link of a panel provider.
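
A minimal sketch of such a quota check: it assumes a total quota of 200 completed interviews, counted via the valid answers to ''SD01'', and a page with the ID ''quotaFull'' that displays the message. The quota, the variable and the page ID are placeholders, and the sketch assumes that ''goToPage()'' is used to jump to that page. Because the current interview is not counted, the check only reflects interviews completed before this one.

<code php>
$quota = 200;  // placeholder quota
// Completed questionnaires with a valid answer to SD01 (alldata not set)
$completed = statistic('count', 'SD01');
if ($completed >= $quota) {
  goToPage('quotaFull');  // placeholder page ID for the "quota reached" page
}
</code>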

**Tip:** If you do not want to count all completed interviews (e.g. if dropouts were redirected to another page using ''[[:en:create:functions:redirect]]''), it makes sense to copy the variable to be counted to a [[:en:create:questions:internal]] later in the questionnaire.

===== Frequency Count =====

When counting frequencies (''count''), a third argument can specify the value whose frequency should be determined. If no third argument is given, the number of valid responses is returned. Missing data is not counted.

For example, the questionnaire contains a question where respondents select their gender (1=female, 2=male, -9=no input). The number of women can be determined by passing the value ''1'' as the third argument:

<code php>
$numberwomen = statistic('count', 'SD01', 1);  // frequency of women (1)
$numbermen = statistic('count', 'SD01', 2); // frequency of men (2)
$numbercompleted = statistic('count', 'SD01');    // number of valid responses (completed questionnaires)
$numberall = statistic('count', 'SD01', false, true); // all data records
html('
  <p>So far, '.$numberall.' people
  specified their gender in this survey, but the questionnaire was
  only completed in '.$numbercompleted.' cases.</p>
  <p>The questionnaires completed are made up of '.
  $numberwomen.' women and '.
  $numbermen.' men.</p>
');
question('SD01');  // question about the respondent's gender
</code>


===== Multivariate Frequency =====

The '''crosscount''' statistic counts the cases (like in cross-tabulations) in which multiple variables apply. 

Instead of a single variable, two or more variables are specified as an array or separated with a comma ('',''). The values being counted for each variable are specified as the third parameter //option//. Only cases which have specified the first value for the first variable, the second value for the second variable and so on are counted. 

<code php>
// SD01 = 1 (female, coded as in the frequency example), SD02 = age group
$nYoungFemale = statistic('crosscount', 'SD01,SD02', '1,1');  // variables and values in a list with commas ...
$nGrownFemale = statistic('crosscount', array('SD01','SD02'), array(1,2));  // ... or in arrays
html('
  <p>So far, '.$nYoungFemale.' people have stated in this survey 
  that they are female and in age group 1 (up to 18 years old).
  '.$nGrownFemale.' women stated that they are 19 or older.</p>
');
question('SD01');  // question about the respondent's gender
question('SD02');  // question about the respondent's age
</code>
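
To fill a complete cross table, '''crosscount''' can be called once per cell. The following is only a sketch, assuming two answer codes per variable (''SD01'': 1, 2 and ''SD02'': 1, 2); the code lists are placeholders and need to be adjusted to the actual answer codes.

<code php>
$genders = array(1, 2);    // assumed answer codes of SD01
$ageGroups = array(1, 2);  // assumed answer codes of SD02
$table = array();
foreach ($genders as $gender) {
  foreach ($ageGroups as $ageGroup) {
    // Number of cases with this combination of codes
    $table[$gender][$ageGroup] = statistic('crosscount', array('SD01', 'SD02'), array($gender, $ageGroup));
  }
}
html('
  <p>Cases with SD01 = 1 and SD02 = 2: '.$table[1][2].'</p>
');
</code>

Each cell is then available as ''$table[gender][ageGroup]'', e.g. for building an HTML table.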


===== Valid Percent =====

The '''percent''' statistic returns the percentage of a value among all valid data. The value to be counted must be given as the third argument.

<code php>
$percentwomen = statistic('percent', 'SD01', 1); // percentage of women
html('
  <p>So far, women account for '.
  $percentwomen.' percent of the valid responses.</p>
');
question('SD01');  // question about the respondent's gender
</code>


===== Mode: Value that Occurs Most Frequently =====

This returns the value that has been selected most frequently so far. If multiple values have been selected equally often then these are returned separated by a comma. 

As a third argument (in this instance a Boolean), it is possible to specify if invalid values (no answer etc.) should also be counted.

<code php>
$mode = statistic('mode', 'AB01_02', true);
$modes = explode(',', $mode);  // separate multiple values
if (count($modes) > 1) {
  // multiple values stated most frequently
  html('
    <p>Multiple answers were selected equally often.</p>
  ');
} else {
  // answer options text (statistic() only provides the numeric code)
  $text = getValueText('AB01_02', $mode);
  html('
    <p>The most common answer for this question was: '.$text.'.</p>
  ');
}
</code>


===== Min, Max and Mean of the Valid Data =====

The statistics '''min''', '''mean''' and '''max''' only return a meaningful value if the variable contains numerical values. Data from a text input is ignored if it is not a number, unless ''true'' is passed as the third parameter to specify that invalid values should also be included in the statistics.

If no valid values are available, '''mean''' returns 0, while '''min''' and '''max''' return ''false''.

<code php>
$min = statistic('min', 'BB01_03');
$max = statistic('max', 'BB01_03');
$mean = statistic('mean', 'BB01_03');
html('
  <p>The participants have given the programme
  an average rating of '.$mean.' so far.</p>
  <p>The ratings lie between '.$min.' and '.$max.'.</p>
');
</code>
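
Because '''min''' and '''max''' return ''false'' when no valid values exist yet, it can be useful to check the result before displaying it. The following sketch uses a strict comparison (''==='') so that a legitimate minimum of 0 is not mistaken for ''false''; the item ID ''BB01_03'' is the same placeholder as above.

<code php>
$min = statistic('min', 'BB01_03');
$max = statistic('max', 'BB01_03');
if (($min === false) || ($max === false)) {
  // No valid values available so far
  html('
    <p>No ratings are available yet.</p>
  ');
} else {
  html('
    <p>The ratings lie between '.$min.' and '.$max.'.</p>
  ');
}
</code>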

===== Evaluate partial data sets =====

Calling ''statistic('filter', ...)'' sets a filter that is applied to all subsequent calls of ''statistic()''. As the second parameter, the //variables// needed in those subsequent calls can optionally be specified; they are then loaded in advance, which speeds up the calls.

The call returns the number of cases matching the filter. The fourth parameter //alldata// only affects this return value, not the subsequent calculations.

<code php>
// Statistics on female respondents only (SD02 = 1)
// RT variables are loaded immediately to reduce latency 
$n = statistic('filter', array('RT02_01', 'RT02_02', 'RT02_03'), 'SD02==1');
// Mean value of ratings (women only)
$mean1 = statistic('mean', 'RT02_01');
$mean2 = statistic('mean', 'RT02_02');
$mean3 = statistic('mean', 'RT02_03');
</code>

The filter allows common comparison operators (''>'', ''>='', ''<'', ''%%<=%%'', ''!='', ''==''), brackets and Boolean operators (''AND'', ''&&'', ''OR'', ''||'', ''NOT'', ''!'').

**Note:** Comparisons are only possible between one variable and a constant value (a number or string), e.g. ''SD02==2''. Comparisons between two variables (''SD03>SD04'') are not supported.

<code php>
// Statistics only on female respondents (SD02 = 1) aged 35 and over (SD03 >= 35)
$n = statistic('filter', false, '(SD02==1) AND (SD03 >= 35)');
</code>
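
The operators can also be nested with brackets. As a further sketch (variable IDs and codes are placeholders), the following filter keeps respondents who are not female (''SD02 != 1'') and who are either younger than 20 or older than 60:

<code php>
// Not female (SD02 != 1) and either under 20 or over 60 (SD03)
$n = statistic('filter', false, '(SD02 != 1) AND ((SD03 < 20) OR (SD03 > 60))');
// Subsequent calls only use the filtered cases
$meanRating = statistic('mean', 'RT02_01');
</code>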

Besides the variable names you can also use ''QUESTNNR'', ''CASE'' and ''LANGUAGE'' for the filter.

<code php>
// Statistics only on female respondents (SD02 = 1) aged 35 and over (SD03 >= 35) in the German language version
$n = statistic('filter', false, '(SD02==1) AND (SD03 >= 35) AND (LANGUAGE == "ger")');
</code>